After a busy week at work, why not head to New York for a great weekend? With so many options in the city, though, choosing where to stay and where to go at night can be a difficult decision. This project therefore analyzes and visualizes data on Airbnb listings and bars in New York to provide a guide for future visitors.
The project uses the following datasets:
1. New York City Airbnb Open Data. The dataset includes the features, price, and location of each listing. It is the main dataset for the final Airbnb selection.
2. 2016 Parties in New York. The dataset includes the location of each bar and the number of noise complaints recorded against it. It is used to identify the number of entertainment venues in the vicinity of each site and the likely noise levels.
3. Uber Pickups in New York City. The dataset includes the location and time of Uber pickups. It serves as an indicator of how easy it is to get around.
4. Census data. The data provide the basic regional unit used in the later steps, where we suggest areas for visitors choosing their preferred Airbnb.
%env MYPATH=C:/Folder Name/file.txt
import pandas as pd
import os
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import folium
import xyzservices
import panel as pn
import datetime
import time
import seaborn as sns
from matplotlib import pyplot as plt
import holoviews as hv
import hvplot.pandas
import contextily as ctx
import geoviews as gv
import geoviews.tile_sources as gvts
import altair as alt
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler, RobustScaler
from sklearn.preprocessing import StandardScaler
import requests
from sodapy import Socrata
import missingno as msno
from scipy.stats import gaussian_kde
import osmnx as ox
from wordcloud import WordCloud
To better understand the Airbnb situation in New York, we first visualized the statistical distribution of the Airbnb data. Apart from some higher-priced listings, the prices of most homes follow a distribution close to normal, concentrated in the $20-$500 per night range. As for the number of reviews, the vast majority of listings received few reviews. When it comes to room type, ‘entire home/apt’ and ‘private room’ make up the majority. What’s more, most listings are located in Manhattan and Brooklyn, which can be explained by the fact that these two boroughs have most of New York’s places to hang out.
To better understand the geographic distribution of Airbnb listings in New York, we used map visualizations for further analysis. The spatial distribution shows that the density of listings is centered on Manhattan and gradually decreases in all directions, while higher prices accumulate at the center of density. Looking at average prices across the larger regions, Manhattan and Brooklyn rank first and second.
m = bdry_price.explore(column="price", scheme="FisherJenks")
m
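The regional averages behind a map like this one can be sketched with a plain pandas group-by. This is a minimal illustration on invented toy data, not the notebook's actual preprocessing; the column names `neighbourhood_group`, `room_type`, and `price` are assumptions about how the listings table is laid out.

```python
import pandas as pd

# Toy stand-in for the Airbnb listings table; all values are invented.
# Column names are assumptions, not taken from the notebook.
listings = pd.DataFrame(
    {
        "neighbourhood_group": [
            "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens", "Bronx",
        ],
        "room_type": [
            "Entire home/apt", "Private room", "Entire home/apt",
            "Private room", "Private room", "Private room",
        ],
        "price": [250, 120, 180, 80, 70, 60],
    }
)

# Mean nightly price per region, highest first.
avg_price = (
    listings.groupby("neighbourhood_group")["price"]
    .mean()
    .sort_values(ascending=False)
)

# Share of listings by room type.
room_share = listings["room_type"].value_counts(normalize=True)

print(avg_price)
print(room_share)
```

Joining `avg_price` back onto the region boundaries by name would give a table that can be mapped by its price column.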
When we look at the distribution of Airbnb prices and reviews at the smaller neighborhood scale, we see that prices also decrease gradually in all directions, centered on Upper Manhattan. As for reviews, the number of reviews in the higher-priced areas is relatively low, while the lower-priced areas have more reviews in total as well as a higher average number of reviews per month. This can be explained by the fact that higher-priced homes have a relatively smaller audience, since fewer people can afford them.
m = census_prm.explore(column="number_of_reviews", scheme="FisherJenks")
m
m = census_prm.explore(column="price", scheme="FisherJenks")
m
Overall, high prices and rich review information rarely come together. Those who are not price-sensitive have more options in Manhattan as well as Northwest Brooklyn. For a more cost-effective option with more review information, consider listings in the Bronx and Queens!
Where Can I Have Fun? – Entertainment Analysis & Visualization
In addition to accommodation, we paid equal attention to the places New York offers for going out after the day’s activities. So why not go to a bar? In terms of distribution, most bars are concentrated in Manhattan, home to the city’s most prosperous business activity and nightlife. However, when we look at complaints, we find that the average number of complaints received by bars is about the same in every borough except the Bronx, with Queens having the highest average, probably because Queens is itself a large borough. Since noisy areas come with disturbances at night, areas near bars with many complaints should be avoided as much as possible when choosing an Airbnb.
m = cen_bar.explore(column="num_calls", scheme="FisherJenks")
m
m = cen_bar.explore(column="City", scheme="FisherJenks")
m
When we looked at where and when parties were held in New York, we saw that most parties were held in residential buildings, and the vast majority ended between nighttime and 5 a.m. the next day. This means that most of the places where parties took place are probably not a good choice for an Airbnb location!
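A summary of this kind can be sketched as follows. The data are invented, the column names `Location Type` and `Closed Date` are assumptions about the party dataset, and treating 20:00 to 04:59 as "night" is our own illustrative cutoff rather than the definition used in the analysis.

```python
import pandas as pd

# Toy stand-in for the party complaint records; all values are invented.
parties = pd.DataFrame(
    {
        "Location Type": [
            "Residential Building/House",
            "Residential Building/House",
            "Club/Bar/Restaurant",
            "Store/Commercial",
        ],
        "Closed Date": [
            "2016-01-01 01:30:00",
            "2016-01-02 23:10:00",
            "2016-01-03 04:45:00",
            "2016-01-03 14:00:00",
        ],
    }
)
parties["Closed Date"] = pd.to_datetime(parties["Closed Date"])

# Hour of day at which each record was closed, used here as a proxy
# for the time the party ended.
parties["end_hour"] = parties["Closed Date"].dt.hour

# Assumed definition of "night": 20:00 up to, but not including, 05:00.
is_night = (parties["end_hour"] >= 20) | (parties["end_hour"] < 5)
night_share = is_night.mean()

# Number of parties per location type, most common first.
by_location = parties["Location Type"].value_counts()

print(night_share)
print(by_location)
```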
How to Choose My Airbnb – Evaluation Data Modeling
After getting an overview of Airbnb and bar entertainment in New York, we wanted to model the choice for different visitor needs. First, we modeled a tourist who wants an Airbnb that is not too expensive, that does not have too many loud bars or parties nearby, and that still has a certain number of bars or parties within a certain distance, so that a ‘last drink before going home’ is possible close to where they stay. Based on such needs, we constructed two levels of data.
The first is the ‘residential parameters’, with the neighborhood as the unit:
the average ending time of parties within a certain radius of the residence;
the average number of complaints filed against bars within a certain radius of the residence.
The second is the ‘recreational parameters’, also with the neighborhood as the unit:
the number of bars within a certain range around the residence;
the number of Uber pickups around the bars (vehicle density varies during the night and depends on Uber’s own dispatching algorithms, so whether enough cars are available around the bars is also a factor to be weighed).
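Two of these parameters, the number of bars within a radius and the average complaints against those bars, can be sketched with plain NumPy. This is an illustration under stated assumptions, not the notebook's implementation: `haversine_m` and `radius_metrics` are hypothetical helpers, each neighborhood is reduced to a single (latitude, longitude) reference point, and the coordinates, complaint counts, and 500 m radius are invented.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs in degrees, broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def radius_metrics(centers, bars, num_calls, radius_m):
    """Per center: bars within radius_m and their mean complaint count.

    centers and bars are arrays of (lat, lon) rows; num_calls has one
    entry per bar.
    """
    centers = np.asarray(centers, dtype=float)
    bars = np.asarray(bars, dtype=float)
    num_calls = np.asarray(num_calls, dtype=float)

    # Pairwise distance matrix of shape (n_centers, n_bars).
    dist = haversine_m(
        centers[:, None, 0], centers[:, None, 1],
        bars[None, :, 0], bars[None, :, 1],
    )
    within = dist <= radius_m

    bar_count = within.sum(axis=1)
    calls_sum = (within * num_calls).sum(axis=1)

    # Mean complaints among nearby bars; NaN where no bar is in range.
    avg_calls = np.full(len(centers), np.nan)
    has_bars = bar_count > 0
    avg_calls[has_bars] = calls_sum[has_bars] / bar_count[has_bars]
    return bar_count, avg_calls


# Invented example: two reference points and three bars.
centers = [(40.7580, -73.9855), (40.8500, -73.8700)]
bars = [(40.7590, -73.9850), (40.7570, -73.9860), (40.7000, -73.9000)]
num_calls = [10, 4, 7]

bar_count, avg_calls = radius_metrics(centers, bars, num_calls, radius_m=500)
print(bar_count, avg_calls)
```

The distance matrix grows with the number of reference points times the number of bars, which is manageable at the neighborhood level but would need chunking or a spatial index for much larger inputs.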
The final set of indicators for evaluating each neighborhood consists of the following variables.
Note: we initially intended to use individual Airbnb rooms as the unit of analysis. While building the metrics, however, we found that with a dataset this large, buffering every listing and collecting the points around its coordinates easily crashes the program. If this problem can be solved in a later iteration, the analysis will be better able to support the selection of individual rooms.
Based on the constructed metrics, we first perform a cluster analysis to find out whether the metrics share any geographical patterns. For the model, we use K-means. We analyze the results from three perspectives: living, playing, and traveling. The previous analysis already gave us preliminary knowledge of these three dimensions, so in the cluster analysis we cross-analyze combinations of dimensions to identify choices for different needs.
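The clustering step can be sketched with scikit-learn as below. This is a minimal example on invented data, not the project's actual model: the two metric columns (bar count and Uber pickups), the choice of `StandardScaler`, and the number of clusters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented neighborhood metrics: one row per neighborhood,
# columns = [number of bars nearby, number of Uber pickups nearby].
metrics = np.array(
    [
        [50, 900], [45, 850], [48, 950],  # lively areas
        [5, 100], [3, 80], [4, 120],      # quiet areas
    ],
    dtype=float,
)

# Put both metrics on a comparable scale so that the larger pickup
# counts do not dominate the distance computation.
scaled = StandardScaler().fit_transform(metrics)

# n_clusters=2 is chosen only because the toy data has two obvious groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(scaled)

print(labels)
```

The resulting labels can be stored as a column on the neighborhood table and plotted as a categorical map. Note that the numbering of the clusters is arbitrary, so a label only has meaning once the metrics inside each cluster are inspected.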
Where Can I Find a Bar? – Clustering Visualization
Based on the cluster analysis, we can offer travelers Airbnb location options that match their different needs. For example, suppose our traveler does not ask much of the accommodation but wants to go to a bar in the evening and get a ride home quickly. We analyzed the clustering of bar- and Uber-related metrics and found that areas labeled 3 and 4 are well suited to this visitor.
# setup the figure
f, ax = plt.subplots(figsize=(10, 8))

# plot, coloring by label column
# specify categorical data and add legend
pbua_nb_gdf.plot(
    column="travel_bar",
    cmap="Dark2",
    categorical=True,
    legend=True,
    edgecolor="k",
    lw=0.5,
    ax=ax,
)

ax.set_axis_off()
plt.axis("equal");
Where Can I Find a Quiet Airbnb? – Clustering Visualization
Let’s take another example. Emily wants to spend a nice weekend in New York City, but she wants to avoid having too many bars in the neighborhood because they are loud and potentially dangerous. Emily also doesn’t have a big budget, so she doesn’t want to spend too much on an Airbnb. From the ‘Airbnb-Bar’ cluster analysis, we find that the area represented by cluster 2 meets Emily’s needs. On the map, the area where the Bronx and Queens meet would be a great place for Emily to stay.
# setup the figure
f, ax = plt.subplots(figsize=(10, 8))

# plot, coloring by label column
# specify categorical data and add legend
pbua_nb_gdf.plot(
    column="bar_air",
    cmap="Dark2",
    categorical=True,
    legend=True,
    edgecolor="k",
    lw=0.5,
    ax=ax,
)

ax.set_axis_off()
plt.axis("equal");
Interactive Airbnb Location Selection Tools
Similarly, we provide an interactive map of the specific locations of Airbnb listings in each community, where visitors can see the prices and the number and frequency of reviews of listings in their preferred neighborhood, which can further help them make the right choice.
def filter_by_neighborhood(data, neighborhood_name):
    # Keep only the rows that fall in the selected neighborhood
    sel = data["ntaname"] == neighborhood_name
    return data.loc[sel]


def airbnb_data(data, neighborhood_name):
    # Draw the boundary of the selected neighborhood as the base map
    sel = nbhd_gdf["ntaname"] == neighborhood_name
    hood_geo = nbhd_gdf.loc[sel]
    m = hood_geo.explore(
        style_kwds={"weight": 4, "color": "black", "fillColor": "none"},
        name="Neighborhood boundary",
        tiles=xyzservices.providers.CartoDB.Voyager,
    )
    # Overlay the filtered points on top of the boundary
    data.explore(
        m=m,  # Add to the existing map!
        marker_kwds={"radius": 7, "fill": True, "color": "crimson"},
        marker_type="circle_marker",  # or 'marker' or 'circle'
        name="Tickets",
    )
    return m


def create_dashboard_1(neighborhood_name):
    # Filter the Airbnb data, build the map, and wrap it in a Panel pane
    tickets = filter_by_neighborhood(air, neighborhood_name)
    m = airbnb_data(tickets, neighborhood_name)
    return pn.pane.plot.Folium(m, height=600)
ticket_dashboard_1 = pn.Column(
    pn.Column("## Airbnb in Your Neighborhood", neighborhoodSelect),
    # Add a height spacer
    pn.Spacer(height=45),
    # Bottom: the main chart, bind widgets to the function
    pn.bind(create_dashboard_1, neighborhood_name=neighborhoodSelect),
)
ticket_dashboard_1
General Features of Airbnb in One Neighborhood
Based on the above analysis, we can point different tourists to the community whose range of Airbnb listings suits them. We would also like to describe to visitors what the listings within such a community have in common. Therefore, we take the names of the listings within the community (as they contain some of the listings’ attractive features) and run a word cloud analysis to extract the common features of Airbnb listings in that community and help tourists screen further.
temp = pn.Column(
    pn.Column("## Airbnb in Your Neighborhood", neighborhoodSelect),
    # Add a height spacer
    pn.Spacer(height=45),
    # Bottom: the main chart, bind widgets to the function
    pn.bind(wcloud, name=neighborhoodSelect),
)
temp
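The `wcloud` function bound above is not shown here. As a rough sketch of the word counting that a word cloud rests on, the example below tallies the words in a few invented listing names; `name_word_counts` and the small stopword set are hypothetical and stand in for whatever the actual function does.

```python
import re
from collections import Counter

# Small illustrative stopword set; a real analysis would likely use a fuller list.
STOPWORDS = {"in", "the", "a", "an", "of", "to", "and", "with", "near", "for"}


def name_word_counts(names):
    """Count lowercase words across listing names, skipping stopwords."""
    counts = Counter()
    for name in names:
        words = re.findall(r"[a-z]+", name.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Invented listing names for one neighborhood.
names = [
    "Cozy room near the park",
    "Sunny and cozy studio",
    "Spacious studio in Brooklyn",
]

counts = name_word_counts(names)
print(counts.most_common(3))
```

A frequency table like `counts` can be handed to `WordCloud().generate_from_frequencies(counts)` from the `wordcloud` package imported at the top of the notebook to render the image.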